LightCode: Compiling LLM Inference for Photonic-Electronic Systems
Ryan Tomich, Zhizhen Zhong, Dirk Englund
The growing demand for low-latency, energy-efficient inference in large language models (LLMs) has catalyzed interest in heterogeneous architectures. While GPUs remain dominant, they are poorly suited for integration with emerging domain-specific accelerators such as Photonic Tensor Units (PTUs), which offer low-power, high-throughput linear computation. This motivates hybrid compilation strategies that combine photonic and electronic resources. We present LightCode, a compiler framework and simulator for mapping LLM inference workloads across hybrid photonic-electronic systems. LightCode introduces the Stacked Graph, an intermediate representation that encodes multiple hardware-specific realizations of each tensor operation. Hardware assignment is formulated as a constrained subgraph selection problem optimized for latency or energy under parametric cost models. We evaluate LightCode on the prefill stage of GPT-2 and Llama-7B, showing that under our workload and hardware assumptions, (i) photonic hardware reduced energy by up to 50% at maximum sequence length; (ii) the choice of multiplexing and assignment strategy yielded latency improvements exceeding 10x; and (iii) optimizing for latency or energy produced distinct hardware mappings. LightCode offers a modular, foundational framework and simulator for compiling LLMs to emerging photonic accelerators.
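To make the formulation concrete, here is a minimal sketch of hardware assignment over a stacked-graph-style representation: each tensor op carries one candidate realization per hardware target with a parametric (latency, energy) cost, and the compiler selects one realization per op under the chosen objective. This is an illustration, not LightCode's actual API; all names and cost numbers below are hypothetical.

```python
# Hypothetical sketch: each op has several hardware realizations; pick one
# per op to minimize latency or energy. A real compiler would also model
# data-movement costs between adjacent ops placed on different devices.

def assign_hardware(ops, costs, objective="latency"):
    """Pick one realization per op minimizing the chosen objective.

    ops       : list of op names, e.g. ["matmul", "softmax"]
    costs     : {op: {hw: (latency, energy)}}
    objective : "latency" or "energy"
    """
    idx = 0 if objective == "latency" else 1
    assignment = {}
    for op in ops:
        assignment[op] = min(costs[op], key=lambda hw: costs[op][hw][idx])
    return assignment

# Made-up costs reflecting the abstract's premise: photonic hardware (PTU)
# is energy-cheap for linear ops but a poor fit for nonlinear ones.
costs = {
    "matmul":  {"ptu": (1.0, 0.2), "gpu": (0.8, 1.0)},
    "softmax": {"ptu": (9.0, 5.0), "gpu": (0.5, 0.6)},
}
print(assign_hardware(["matmul", "softmax"], costs, "energy"))
# -> {'matmul': 'ptu', 'softmax': 'gpu'}
print(assign_hardware(["matmul", "softmax"], costs, "latency"))
# -> {'matmul': 'gpu', 'softmax': 'gpu'}
```

Note how the two objectives already produce distinct mappings on this toy instance, mirroring finding (iii) above.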
MiCo: End-to-End Mixed Precision Neural Network Co-Exploration Framework for Edge AI
Quantized Neural Networks (QNNs) with extremely low-bitwidth data have proven promising for efficient storage and computation on edge devices. To further reduce the accuracy drop while increasing speedup, layer-wise mixed-precision quantization (MPQ) has become a popular solution. However, existing algorithms for exploring MPQ schemes are limited in flexibility and efficiency. Comprehending the complex impacts of different MPQ schemes on post-training quantization and quantization-aware training results is a challenge for conventional methods. Furthermore, an end-to-end framework for the optimization and deployment of MPQ models is missing in existing work. In this paper, we propose the MiCo framework, a holistic MPQ exploration and deployment framework for edge AI applications. The framework adopts a novel optimization algorithm to search for optimal quantization schemes with the highest accuracies while meeting latency constraints. Hardware-aware latency models are built for different hardware targets to enable fast exploration. After the exploration, the framework enables direct deployment from PyTorch MPQ models to bare-metal C code, leading to end-to-end speedup with minimal accuracy drops. Tiny machine learning (ML) and edge artificial intelligence (AI) are becoming increasingly important and valuable in today's AI ecosystem. However, deploying AI models on edge devices is challenging due to the tight resource constraints.
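The search problem described above can be sketched in a few lines: choose a per-layer bitwidth that satisfies a latency budget (from a hardware-aware latency model) while paying the smallest accuracy cost. This is a generic greedy illustration, not MiCo's actual algorithm; the layer names, latencies, and accuracy-drop numbers are all made up.

```python
# Hypothetical layer-wise mixed-precision search under a latency budget.
# Each layer may run at 8, 4, or 2 bits; lower bitwidths are faster but
# cost accuracy. Greedily lower whichever layer's bitwidth is cheapest
# in accuracy until the latency budget is met.

BITS = (8, 4, 2)

def mpq_search(latency, acc_drop, budget):
    """latency  : {layer: {bits: ms}}    hardware-aware latency model
       acc_drop : {layer: {bits: drop}}  estimated accuracy drop vs. fp32
       budget   : total latency budget (ms)"""
    scheme = {layer: 8 for layer in latency}          # start at 8-bit everywhere
    def total(s): return sum(latency[l][b] for l, b in s.items())
    while total(scheme) > budget:
        # Candidate moves: drop one layer to the next lower bitwidth.
        moves = [(acc_drop[l][BITS[BITS.index(b) + 1]] - acc_drop[l][b], l)
                 for l, b in scheme.items() if b != 2]
        if not moves:
            raise ValueError("budget infeasible even at 2-bit everywhere")
        _, layer = min(moves)                         # cheapest accuracy hit
        scheme[layer] = BITS[BITS.index(scheme[layer]) + 1]
    return scheme

latency  = {"conv1": {8: 4, 4: 2, 2: 1}, "fc": {8: 6, 4: 3, 2: 2}}
acc_drop = {"conv1": {8: 0.1, 4: 0.5, 2: 2.0}, "fc": {8: 0.1, 4: 0.2, 2: 1.5}}
print(mpq_search(latency, acc_drop, budget=6))
# -> {'conv1': 4, 'fc': 4}
```

A swapped-in latency model per target is what makes the exploration "hardware-aware": the same accuracy estimates yield different schemes on different devices.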
Simplify deploying YOLOv5 using the new OctoML CLI
Follow along with our new YOLOv5 deployment tutorial to power your next object detection application. Or, watch this tutorial video by Smitha Kolan on how to deploy YOLOv5 in under 15 minutes using the OctoML CLI. Today, we are excited to announce the results of our collaboration with Ultralytics to deploy the YOLOv5 models to over 100 CPU and GPU hardware targets in AWS, Azure and GCP. Our engineering work with Ultralytics unlocks the ability to deploy YOLOv5 models on hardware from Intel, NVIDIA, Arm and AWS, with minimal effort and cost. In this blog, I'll show you how simple it is to achieve hardware independence and cost savings across multiple clouds.
The Next Big Programming Language You've Never Heard Of
At the International Conference on Programming Language Design and Implementation (2022), scientists from MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL) published a research paper titled, 'Exocompilation for productive programming of hardware accelerators' that proposes a new programming language, 'Exo', which can be used for writing high-performance code on hardware accelerators. Exo is a domain-specific programming language that helps low-level performance engineers transform very simple programs which specify what they want to compute into very complex programs that do the same thing as the specification but much faster. It is both a programming language and a compiler and allows custom hardware instructions, specialised memories and accelerator configuration states to be defined in user libraries. Exo builds on the idea of user scheduling to externalise hardware mapping and optimisation decisions. Accelerators like GPUs and image signal processors play an increasingly important role in modern computer systems.
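The core idea of user scheduling can be shown in miniature: start from a simple specification of what to compute, then apply semantics-preserving rewrites that change how it runs without changing what it means. The sketch below is plain Python, not Exo's actual API; it only illustrates the kind of loop-split transform a performance engineer would script.

```python
# Illustration of user scheduling (NOT Exo's real syntax): a simple spec,
# then the same computation after a "split" rewrite on the inner loop --
# the rewrite changes performance characteristics, never the result.

def spec_matvec(A, x):
    """Specification: straightforward matrix-vector product."""
    n, m = len(A), len(x)
    y = [0.0] * n
    for i in range(n):
        for j in range(m):
            y[i] += A[i][j] * x[j]
    return y

def scheduled_matvec(A, x, tile=4):
    """Same computation after splitting the j loop into tiles, the kind
    of transform that maps onto vector units or accelerator memories."""
    n, m = len(A), len(x)
    y = [0.0] * n
    for i in range(n):
        for jo in range(0, m, tile):                  # outer tile loop
            for j in range(jo, min(jo + tile, m)):    # inner tile loop
                y[i] += A[i][j] * x[j]
    return y

A = [[1, 2, 3], [4, 5, 6]]
x = [1, 1, 1]
assert spec_matvec(A, x) == scheduled_matvec(A, x)    # rewrite preserves meaning
```

In Exo itself, such rewrites are applied by the compiler under user direction, and the hardware-specific instructions and memories they target are defined in user libraries rather than baked into the compiler.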
AI design changes on the horizon from open-source Apache TVM and OctoML
In recent years, artificial intelligence programs have been prompting changes in computer chip designs, and novel computers have made new kinds of neural networks in AI possible. There is a powerful feedback loop going on. In the center of that loop sits software technology that converts neural net programs to run on novel hardware. And at the center of that sits a recent open-source project gaining momentum. Apache TVM is a compiler that operates differently from other compilers.
'Octomize' Your ML Code
If you're spending months hand-tuning your machine learning model to run well on a particular type of processor, you might be interested in a startup called OctoML, which recently raised $28 million to bring its innovative "Octomizer" to market. Octomizer is the commercial version of Apache TVM, an open source compiler that was created in Professor Luiz Ceze's research project in the Computer Science Department at the University of Washington. Datanami recently caught up with the professor, who is also the CEO of OctoML, to learn about the state of machine learning model compilation in a rapidly changing hardware world. According to Ceze, there is a big gap in the MLOps workflow between the completion of the machine learning model by the data scientist or machine learning engineer, and deployment of that model into the real world. Quite often, the services of a software engineer are required to convert the ML model, which is often written in Python using one of the popular frameworks like TensorFlow or PyTorch, into highly optimized C or C++ that can run on a particular processor.
plaidml/plaidml
This will act as our development branch going forward and will allow us to more rapidly prototype the changes we're making without breaking our existing user base. As a precaution, please note that certain features, tests, and hardware targets may be broken in plaidml-v1. You can continue to use code on the master branch or from our releases on PyPI. For your convenience, the contents of our master branch will be released as version 0.7.0. We are keeping the master branch of PlaidML stable and maintaining it until plaidml-v1 is ready for production.
pytorch/glow
Glow is a machine learning compiler and execution engine for various hardware targets. It is designed to be used as a backend for high-level machine learning frameworks. The compiler is designed to allow state-of-the-art compiler optimizations and code generation of neural network graphs. This library is experimental and in active development. Glow lowers a traditional neural network dataflow graph into a two-phase strongly-typed intermediate representation (IR).
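The lowering step described above can be illustrated with a toy rewrite: a high-level graph node is expanded into simpler primitive operations before further optimization. The node names and the single rewrite rule below are simplified stand-ins, not Glow's actual IR.

```python
# Toy illustration of graph lowering (not Glow's real IR): a high-level
# FullyConnected node is rewritten into primitive linear-algebra ops.

def lower(node):
    """Lower one high-level node into a list of primitive instructions."""
    op, args = node
    if op == "FullyConnected":        # FC(x, W, b) -> MatMul, then Add
        x, w, b = args
        return [("MatMul", (x, w), "t0"), ("Add", ("t0", b), "out")]
    return [node]                     # already primitive: pass through

graph = [("FullyConnected", ("x", "W", "b"))]
ir = [instr for node in graph for instr in lower(node)]
print(ir)
# -> [('MatMul', ('x', 'W'), 't0'), ('Add', ('t0', 'b'), 'out')]
```

Lowering a rich node vocabulary into a small primitive set is what lets the later, strongly-typed phase apply generic optimizations and per-target code generation without knowing about every framework-level operator.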